Lexicon management and standard formats
Abstract
International standards for lexicon formats are in preparation. To a certain extent, the proposed formats converge with prior results of standardization projects. However, their adequacy for (i) lexicon management and (ii) lexicon-driven applications has been little debated in the past, nor is it addressed as part of the present standardization effort. We examine these issues. IGM has developed XML formats compatible with the emerging international standards, and we report experimental results on large-coverage lexica.
Similar resources
Towards a Generic Architecture for Lexicon Management
In this paper we propose an architecture for a lexicon management tool MANAGELEX. This tool aims at a general environment for reading, updating and combining lexicons in different formats. The starting point is the already existing lexicon models MULTILEX and GENELEX. Each functionality (reading, updating and combining) is based on a corresponding model, which can be configured and maintained c...
Continuous speech recognition in the WAXHOLM dialogue system
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software only system written in portable C code. The design is flexible and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for resea...
Computer Tools for the Management of Lexicon-Grammar Databases
Lexicon grammar is a systematic method for the analysis and the representation of the elementary sentence structures of a natural language; its product: large collections of syntactic electronic dictionaries or lexicon-grammar tables (LGTs). In order to describe a language, very long term collaborative work is required. However, the current computer tools for the management of LGTs do not fulfi...
A Computational Lexicon Of Portuguese For Automatic Text Parsing
Using standard methods and formats established at LADL, and adopted by several European research teams to construct large-coverage electronic dictionaries and grammars, we elaborated for Portuguese a set of lexical resources that were implemented in INTEX. We describe the main features of such linguistic data, refer to their maintenance and extension, and give different examples of automatic text...
Outilex, plate-forme logicielle de traitement de textes écrits
The Outilex software platform, which will be made available to research, development and industry, comprises software components implementing all the fundamental operations of written text processing: processing without lexicons, exploitation of lexicons and grammars, language resource management. All data are structured in XML formats, and also in more compact formats, either readable or bina...
Journal title:
- CoRR
Volume abs/0711.3449
Publication date 2005